
Fine-Tuning Language Models with Just Forward Passes

Neural Information Processing Systems

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two forward passes but are theorized to be catastrophically slow for optimizing large models. In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference. For example, with a single A100 80GB GPU, MeZO can train a 30-billion parameter model, whereas fine-tuning with backpropagation can train only a 2.7B LM with the same budget. We conduct comprehensive experiments across model types (masked and autoregressive LMs), model scales (up to 66B), and downstream tasks (classification, multiple-choice, and generation). Our results demonstrate that (1) MeZO significantly outperforms in-context learning and linear probing; (2) MeZO achieves comparable performance to fine-tuning with backpropagation across multiple tasks, with up to 12× memory reduction and up to 2× GPU-hour reduction in our implementation; (3) MeZO is compatible with both full-parameter and parameter-efficient tuning techniques such as LoRA and prefix tuning; (4) MeZO can effectively optimize non-differentiable objectives (e.g., maximizing accuracy or F1). We support our empirical findings with theoretical insights, highlighting how adequate pre-training and task prompts enable MeZO to fine-tune huge models, despite classical ZO analyses suggesting otherwise.
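The central loop of a memory-efficient ZO-SGD step can be illustrated in a few lines. The sketch below is a simplified toy version, not the paper's implementation: the variable names, step sizes, and the toy quadratic loss are all illustrative. The key ideas it shows are that the perturbation z is regenerated from a seed instead of stored, the two forward passes evaluate the loss at θ + εz and θ − εz in place, and the resulting scalar drives an SGD-style update.

```python
import numpy as np

def mezo_step(theta, loss_fn, eps=1e-3, lr=1e-2, seed=0):
    """One memory-efficient ZO-SGD step (simplified sketch, not the
    authors' code). The perturbation z is regenerated from the seed
    rather than stored, so peak memory stays at the size of theta."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(theta.shape)

    theta += eps * z                 # perturb in place: theta + eps*z
    loss_plus = loss_fn(theta)       # first forward pass
    theta -= 2 * eps * z             # move to theta - eps*z
    loss_minus = loss_fn(theta)      # second forward pass
    theta += eps * z                 # restore the original theta

    grad_est = (loss_plus - loss_minus) / (2 * eps)  # projected gradient
    theta -= lr * grad_est * z       # SGD update using the scalar estimate
    return theta

# Toy usage: minimize the quadratic ||theta||^2.
theta = np.ones(4)
for step in range(200):
    theta = mezo_step(theta, lambda t: float(t @ t), seed=step)
```

Note that only the seed and the scalar loss difference need to persist between the two passes, which is what keeps the memory footprint at inference level.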


Fine-tuning language models to find agreement among humans with diverse preferences

Neural Information Processing Systems

Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality.


Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees

Neural Information Processing Systems

Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data parallel-style training, compressing the activations for models trained with pipeline parallelism is still an open problem. In this paper, we propose AQ-SGD, a novel activation compression algorithm for communication-efficient pipeline parallelism training over slow networks.
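The abstract does not spell out the algorithm, but its basic ingredient, quantizing activations before sending them across a slow network, can be sketched generically. The uniform per-tensor scheme below is only a hedged illustration with made-up names; AQ-SGD's actual contribution is compressing activation changes across training steps, with convergence guarantees.

```python
import numpy as np

def quantize(x, bits=4):
    """Uniform per-tensor quantization (generic illustration, not
    AQ-SGD's scheme). Maps floats onto 2**bits integer levels."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1) if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)  # sent over the wire
    return q, lo, scale

def dequantize(q, lo, scale):
    """Reconstruct an approximation of the original activations."""
    return q.astype(np.float64) * scale + lo

# Toy usage: a 4-bit round trip bounds the error by half a quantization step.
x = np.linspace(-1.0, 1.0, 17)
q, lo, scale = quantize(x, bits=4)
x_hat = dequantize(q, lo, scale)
```

In a pipeline-parallel setting, only `q`, `lo`, and `scale` would cross the network boundary between stages, which is where the communication savings come from.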


On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

Neural Information Processing Systems

The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a "training from scratch" to a "fine-tuning" paradigm. While in some applications the goal is to "nudge" the pre-trained distribution towards preferred outputs, in others it is to steer it towards a different distribution over the sample space. Two main paradigms have emerged to tackle this challenge: Reward Maximization (RM) and, more recently, Distribution Matching (DM). RM applies standard Reinforcement Learning (RL) techniques, such as Policy Gradients, to gradually increase the reward signal. DM prescribes to first make explicit the target distribution that the model is fine-tuned to approximate.
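As a hedged illustration of the RM paradigm described above, the sketch below runs vanilla REINFORCE, a basic Policy Gradients method, on a two-armed bandit. The rewards, learning rate, and variable names are made up for illustration and are not from the paper; the point is only the shape of the update: sample from the policy, then push the log-probability of the sampled action in proportion to its reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-armed bandit stands in for the fine-tuned model's output space.
logits = np.zeros(2)
rewards = np.array([0.2, 1.0])  # arm 1 yields higher reward

for _ in range(2000):
    p = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    a = rng.choice(2, p=p)                     # sample an action
    r = rewards[a]
    grad = -p.copy()
    grad[a] += 1.0                             # d log p(a) / d logits
    logits += 0.1 * r * grad                   # ascend the expected reward

p = np.exp(logits) / np.exp(logits).sum()      # policy after training
```

Reward Maximization gradually concentrates probability on high-reward outputs; DM instead fixes a target distribution up front and minimizes a divergence to it.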


Devstral: Fine-tuning Language Models for Coding Agent Applications

Rastogi, Abhinav, Yang, Adam, Jiang, Albert Q., Liu, Alexander H., Sablayrolles, Alexandre, Héliou, Amélie, Martin, Amélie, Agarwal, Anmol, Ehrenberg, Andy, Lo, Andy, Roux, Antoine, Darcet, Arthur, Mensch, Arthur, Bout, Baptiste, Rozière, Baptiste, De Monicault, Baudouin, Bamford, Chris, Wallenwein, Christian, Renaudin, Christophe, Lanfranchi, Clémence, Denoix, Clément, Barreau, Corentin, Mizelle, Darius Dabert Devon, Casas, Diego de las, Chane-Sane, Elliot, Fugier, Emilien, Hanna, Emma Bou, Berrada, Gabrielle, Delerce, Gauthier, Guinet, Gauthier, Novikov, Georgii, Neubig, Graham, Lample, Guillaume, Martin, Guillaume, Jaju, Himanshu, Ludziejewski, Jan, Rute, Jason, Delignon, Jean-Malo, Chabran, JeanHadrien, Studnia, Joachim, Barmentlo, Joep, Amar, Jonas, Roberts, Josselin Somerville, Denize, Julien, Saxena, Karan, Yadav, Karmesh, Khandelwal, Kartik, Chandu, Khyathi Raghavi, Jain, Kush, Lavaud, Lélio Renard, Blier, Léonard, Zhao, Lingxiao, Martin, Louis, Saulnier, Lucile, Gao, Luyu, Pellat, Marie, Guillaumin, Mathilde, Felardos, Mathis, Dinot, Matthieu, Darrin, Maxime, Augustin, Maximilian, Seznec, Mickaël, Gupta, Neha, Raghuraman, Nikhil, Duchenne, Olivier, Wang, Patricia, von Platen, Patrick, Saffer, Patryk, Jacob, Paul, Wambergue, Paul, Kurylowicz, Paula, Chagniot, Philomène, Stock, Pierre, Agrawal, Pravesh, Delacourt, Rémi, Soletskyi, Roman, Sauvestre, Romain, Vaze, Sagar, Gandhi, Sanchit, Subramanian, Sandeep, Dalal, Shashwat, Gandhi, Siddharth, Ghosh, Soham, Mishra, Srijan, Aithal, Sumukh, Antoniak, Szymon, Scao, Teven Le, Lavril, Thibaut, Schueller, Thibault, Foubert, Thomas, Robert, Thomas, Wang, Thomas, Lacroix, Timothée, Bewley, Tom, Nemychnikova, Valeriia, Paltz, Victor, Richard, Virgile, Li, Wen-Ding, Marshall, William, Wang, Xingyao, Zhang, Xuanyu, Wan, Yihan, Tang, Yunhao

arXiv.org Artificial Intelligence

We introduce Devstral-Small, a lightweight open-source model for code agents with the best performance among models below 100B parameters. In this technical report, we give an overview of how we designed and developed the model and crafted its specialization in agentic software development. The resulting model, Devstral-Small, is a small 24B-parameter model that is fast and easy to serve. Despite its size, Devstral-Small still attains competitive performance compared to models more than an order of magnitude larger.


Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

Neural Information Processing Systems

The two-stage fine-tuning (FT) method, linear probing (LP) then fine-tuning (LP-FT), outperforms linear probing and FT alone. This holds true for both in-distribution (ID) and out-of-distribution (OOD) data. One key reason for its success is the preservation of pre-trained features, achieved by obtaining a near-optimal linear head during LP. However, despite the widespread use of large language models, there has been limited exploration of more complex architectures such as Transformers. In this paper, we analyze the training dynamics of LP-FT for classification tasks on the basis of the neural tangent kernel (NTK) theory.
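The two-stage LP-FT procedure can be sketched on a toy classification problem. Everything below (the random feature map, the learning rates, the step counts) is an illustrative assumption rather than the paper's setup; the point is only the structure: stage one trains the linear head on frozen features, and stage two fine-tunes all parameters starting from that near-optimal head, which is what limits the distortion of pre-trained features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a frozen "pretrained" feature map W and a near-separable task.
X = rng.standard_normal((200, 8))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(float)
W = rng.standard_normal((8, 16)) * 0.5  # stands in for pre-trained weights
head = np.zeros(16)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stage 1: linear probing (LP) -- only the head moves, features stay frozen.
for _ in range(1000):
    feats = np.tanh(X @ W)
    p = sigmoid(feats @ head)
    head -= 0.5 * feats.T @ (p - y) / len(y)

# Stage 2: fine-tuning (FT) -- all parameters move, starting from the LP head.
for _ in range(500):
    feats = np.tanh(X @ W)
    p = sigmoid(feats @ head)
    g = (p - y) / len(y)                             # logistic-loss gradient
    head -= 0.1 * feats.T @ g
    W -= 0.1 * X.T @ (np.outer(g, head) * (1 - feats**2))

acc = ((sigmoid(np.tanh(X @ W) @ head) > 0.5) == y).mean()
```

Running FT directly from a zero head would instead send large early gradients through `W`; initializing FT from the LP head is exactly the feature-preservation effect the abstract refers to.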


Fine-tuning Language Models for Recipe Generation: A Comparative Analysis and Benchmark Study

Vij, Anneketh, Liu, Changhao, Nair, Rahul Anil, Ho, Theodore Eugene, Shi, Edward, Bhowmick, Ayan

arXiv.org Artificial Intelligence

This research explores the recipe generation task by fine-tuning various very small language models, with a focus on developing robust evaluation metrics and comparing different language models on this open-ended task. The study presents extensive experiments with multiple model architectures, ranging from T5-small (Raffel et al., 2023) and SmolLM-135M (Allal et al., 2024) to Phi-2 (Research, 2023), implementing both traditional NLP metrics and custom domain-specific evaluation metrics. Our novel evaluation framework incorporates recipe-specific metrics for assessing content quality and introduces approaches to allergen substitution. The results indicate that, while larger models generally perform better on standard metrics, the relationship between model size and recipe quality is more nuanced when considering domain-specific metrics. SmolLM-360M and SmolLM-1.7B demonstrate comparable performance despite their size difference before and after fine-tuning, while fine-tuning Phi-2 shows notable limitations in recipe generation despite its larger parameter count. The comprehensive evaluation framework and allergen substitution systems provide valuable insights for future work in recipe generation and broader NLG tasks that require domain expertise and safety considerations.

